
    SOMA: A Tool for Synthesizing and Optimizing Memory Accesses in ASICs

    Arbitrary memory dependencies and variable-latency memory systems are major obstacles to the synthesis of large-scale ASIC systems in high-level synthesis. This paper presents SOMA, a synthesis framework for constructing Memory Access Network (MAN) architectures that inherently enforce memory consistency in the presence of dynamic memory access dependencies. A fundamental bottleneck in any such network is arbitrating between concurrent accesses to a shared memory resource. To alleviate this bottleneck, SOMA uses an application-specific concurrency analysis technique to predict the dynamic memory parallelism profile of the application, which is then used to customize the MAN architecture. Depending on the parallelism profile, the MAN may be optimized for latency, throughput, or both. The optimized MAN is automatically synthesized into gate-level structural Verilog using a flexible library of network building blocks. SOMA has been successfully integrated into an automated C-to-hardware synthesis flow, which generates standard cell circuits from unrestricted ANSI-C programs. Post-layout experiments demonstrate that application-specific MAN construction significantly improves power and performance.
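
    As a rough illustration of the kind of analysis the abstract describes (and not of SOMA itself), the sketch below derives a dynamic memory parallelism profile from a hypothetical access trace and uses it to pick between two hypothetical MAN templates; the trace format, the threshold, and the template names are assumptions made for the example.

```python
# Illustrative sketch only (not SOMA): build a memory-parallelism profile from
# an access trace and use it to choose between two hypothetical MAN templates.
from collections import Counter

def parallelism_profile(trace):
    """trace: iterable of (cycle, address) pairs from a simulation run."""
    accesses_per_cycle = Counter(cycle for cycle, _addr in trace)
    # Map each parallelism level to the number of cycles observed at that level.
    return Counter(accesses_per_cycle.values())

def choose_man_template(profile, concurrency_threshold=0.5):
    """Assumed policy: prefer a throughput-oriented MAN when a large fraction
    of active cycles carry concurrent accesses, a latency-oriented one otherwise."""
    total = sum(profile.values())
    concurrent = sum(n for level, n in profile.items() if level > 1)
    return "throughput" if concurrent / total > concurrency_threshold else "latency"

trace = [(0, 0x10), (0, 0x20), (1, 0x30), (2, 0x40), (2, 0x44), (2, 0x48)]
profile = parallelism_profile(trace)
print(profile, "->", choose_man_template(profile))
```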

    Factors Influencing the Performance of a CPU-RFU Hybrid Architecture

    In the face of the marked revalorization of textuality in the poetics of the 1960s in Portugal – a perspective that, with some specificities, is also found in the French and Spanish contexts – the self-demarcation of the poets emerging in the following decade is, at times, strongly reactive. But is there an essential difference between these two inflections, embodied in apparently distinct poetics? And is there any moment, in the second half of the twentieth century, in which a rupture is actually achieved? This study seeks to show that, rather than producing a rupture, the poetics emerging in the 1960s consolidate a tradition of modernity by choosing its most radical strand, while the subsequent poetics prefer to resume the more remote tradition of modernity, in the Baudelairean sense. Although these are two different dialogues with tradition, it can be observed that, in both cases, tradition is taken up again to a point that prevents us from speaking of rupture.

    Characterization of memory T cell subsets and common γ-chain cytokines in convalescent COVID-19 individuals

    T cells are thought to be an important correlate of protection against SARS-CoV-2 infection. However, the composition of T cell subsets in individuals convalescing from SARS-CoV-2 infection has not been well studied. The authors determined the lymphocyte absolute counts, the frequencies of memory T cell subsets, and the plasma levels of common γ-chain cytokines in 7 groups of COVID-19 individuals, based on days since RT-PCR confirmation of SARS-CoV-2 infection. The data show that the absolute counts and frequencies of lymphocytes, as well as the frequencies of CD4(+) central and effector memory cells, increased, while the frequencies of CD4(+) naïve T cells, transitional memory, stem cell memory T cells, and regulatory cells decreased from Days 15–30 to Days 61–90 and plateaued thereafter. Similarly, the frequencies of CD8(+) central memory, effector, and terminal effector memory T cells increased, while the frequencies of CD8(+) naïve cells, transitional memory, and stem cell memory T cells decreased from Days 15–30 to Days 61–90 and plateaued thereafter. The plasma levels of the common γc cytokines IL-2, IL-7, IL-15, and IL-21 decreased from Days 15–30 through Days 151–180. Severe COVID-19 patients exhibited decreased lymphocyte counts and frequencies, higher frequencies of naïve cells and regulatory T cells, lower frequencies of central memory, effector memory, and stem cell memory T cells, and elevated plasma levels of IL-2, IL-7, IL-15, and IL-21. Finally, there was a significant correlation between memory T cell subsets and common γc cytokines. Thus, the study provides evidence of alterations in lymphocyte counts, memory T cell subset frequencies, and common γ-chain cytokine levels in convalescent COVID-19 individuals.

    Operation chaining asynchronous pipelined circuits

    We define operation chaining (op-chaining) as an optimization problem that determines the optimal pipeline depth for balancing performance against energy demands in pipelined asynchronous designs. Since there are no clock-period requirements, asynchronous pipeline stages can have non-uniform latencies. We exploit this fact to coalesce several stages, saving power and area by eliminating control-path resources from the pipeline; the trade-off is potentially reduced pipeline parallelism. In this paper, we formally define this optimization as a graph covering problem, which finds sub-graphs that will be synthesized as op-chained pipeline stages. We then define the solution space of provably correct solutions and present an algorithm to efficiently search this space. The search technique partitions the graph based on post-dominator relationships to find sub-graphs that are potential op-chain candidates. We use knowledge of the Global Critical Path (GCP) [13] to evaluate the performance impact of accepting a candidate sub-graph, and formulate a heuristic cost function to model this trade-off. The algorithm has quadratic-time complexity in the size of the dataflow graph. We have implemented this algorithm within an automated asynchronous synthesis toolchain [12]. Experimental evidence from applying the algorithm to several media-processing kernels shows that the average energy-delay and energy-delay-area products improve by about 1.4x and 1.8x respectively, with maximum improvements of 5x and 18x.
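
    To make the latency-versus-control-overhead trade-off concrete, here is a toy sketch that is much simpler than the post-dominator/GCP formulation above: operations of a purely linear pipeline are greedily coalesced into op-chained stages as long as the merged latency stays under an assumed cycle-time bound, and every removed stage boundary is credited with one unit of saved control overhead. The latencies, the bound, and the overhead weight are invented for the example.

```python
# Toy sketch only (not the paper's algorithm): greedy op-chaining of a linear
# pipeline under an assumed per-stage latency bound.

def chain_stages(latencies, cycle_time_bound, ctrl_overhead=1.0):
    """Coalesce consecutive operations while the merged latency stays within
    the assumed bound; each removed stage boundary saves one unit of
    control-path overhead, at the cost of pipeline parallelism."""
    chains, current = [], []
    for lat in latencies:
        if current and sum(current) + lat <= cycle_time_bound:
            current.append(lat)           # chain this operation into the current stage
        else:
            if current:
                chains.append(current)
            current = [lat]               # start a new op-chained stage
    if current:
        chains.append(current)
    saved_ctrl = (len(latencies) - len(chains)) * ctrl_overhead
    return chains, saved_ctrl

# Hypothetical per-operation latencies and cycle-time bound.
print(chain_stages([2, 1, 1, 3, 2, 2], cycle_time_bound=4))
```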

    A compiler framework for mapping applications to a coarse-grained reconfigurable computer architecture

    The rapid growth of silicon densities has made it feasible to deploy reconfigurable hardware as a highly parallel computing platform. However, in most cases, the application needs to be programmed in hardware description or assembly languages, whereas most application programmers are familiar with the algorithmic programming paradigm. SA-C has been proposed as an expression-oriented language designed to implicitly express data-parallel operations. Morphosys is a reconfigurable system-on-chip architecture that supports a data-parallel, SIMD computational model. This paper describes a compiler framework to analyze SA-C programs, perform optimizations, and map the application onto the Morphosys architecture. The mapping process involves operation scheduling, resource allocation and binding, and register allocation in the context of the Morphosys architecture. The execution times of some compiled image-processing kernels achieve up to a 42x speed-up over an 800 MHz Pentium III machine.
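
    As a loose illustration of one facet of such a mapping flow (not the SA-C/Morphosys compiler itself), the sketch below list-schedules a small dataflow graph onto a fixed number of reconfigurable cells per cycle; the graph, the cell count, and the unit-latency assumption are all invented for the example.

```python
# Illustrative sketch only: resource-constrained list scheduling of a dataflow
# graph onto an assumed array of identical reconfigurable cells.

def list_schedule(ops, deps, num_cells):
    """ops: operation names; deps: (producer, consumer) pairs.
    Returns {op: (cycle, cell)}, assuming unit-latency operations."""
    preds = {op: set() for op in ops}
    for u, v in deps:
        preds[v].add(u)
    scheduled, schedule, cycle = set(), {}, 0
    while len(scheduled) < len(preds):
        # Operations whose producers have all completed in earlier cycles.
        ready = [op for op in preds
                 if op not in scheduled and preds[op] <= scheduled]
        for cell, op in enumerate(ready[:num_cells]):   # bind at most num_cells ops
            schedule[op] = (cycle, cell)
        scheduled.update(ready[:num_cells])
        cycle += 1
    return schedule

# Hypothetical kernel: two loads feed a multiply, whose result is added to a third load.
deps = [("load_a", "mul"), ("load_b", "mul"), ("mul", "add"), ("load_c", "add")]
print(list_schedule(["load_a", "load_b", "load_c", "mul", "add"], deps, num_cells=2))
```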

    Slack Analysis in the System Design Loop

    We present a system-level technique to analyze the impact of design optimizations on system-level timing dependencies. This technique speeds up the design cycle by substituting, in the design loop, the time-consuming simulation step with a fast timing-update routine. As a result, we can significantly reduce the design time from the order of hours or days to the order of seconds or minutes. The update algorithm is defined on the Transaction Level Model (TLM) and can be used by any design flow that invokes TLM-based optimizations. The algorithm has linear-time complexity in the program size, and experimental results indicate that any loss of accuracy due to this technique is negligible (< ±1%); the benefit is a reduction in total design-cycle time from several hours to a matter of seconds.
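
    A minimal sketch of the general idea, with an assumed data model rather than the paper's TLM: after an optimization changes a delay, transaction finish times are recomputed in one linear-time topological pass over the dependency graph instead of re-running a full simulation.

```python
# Minimal sketch with an assumed data model (not the paper's TLM): one
# linear-time topological pass recomputes finish times after a delay changes.
from collections import defaultdict, deque

def update_timing(delays, edges):
    """delays: {node: delay}; edges: (predecessor, successor) dependency pairs.
    Returns {node: finish_time}."""
    preds, succs, indeg = defaultdict(list), defaultdict(list), defaultdict(int)
    for u, v in edges:
        preds[v].append(u)
        succs[u].append(v)
        indeg[v] += 1
    ready = deque(n for n in delays if indeg[n] == 0)
    finish = {}
    while ready:
        n = ready.popleft()
        finish[n] = max((finish[p] for p in preds[n]), default=0) + delays[n]
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return finish

# Hypothetical transactions and dependencies.
delays = {"fetch": 2, "decode": 1, "mem": 4, "exec": 3}
edges = [("fetch", "decode"), ("decode", "mem"), ("decode", "exec")]
print(update_timing(delays, edges))   # baseline timing
delays["mem"] = 2                     # an optimization shortens the "mem" transaction
print(update_timing(delays, edges))   # fast update instead of re-simulating
```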

    Leveraging protocol knowledge in slack matching

    Stalls due to mismatches in communication rates are a major performance obstacle in pipelined circuits. If the rate of data production is faster than the rate of consumption, the resulting design performs more slowly than when the communication rates are matched. This can be remedied by inserting pipeline buffers (to temporarily hold data), allowing the producer to proceed when the consumer is not ready to accept data. The problem of deciding which channels need these buffers (and how many) for an arbitrary communication profile is called the slack matching problem, which has been shown to be NP-complete. In this paper, we present a heuristic that uses knowledge of the communication protocol to explicitly model these bottlenecks, and an iterative algorithm that progressively removes them by inserting buffers. We apply this algorithm to asynchronous circuits and show that it naturally handles large designs with arbitrary cyclic and acyclic topologies that exhibit various types of control choice. The heuristic is efficient, achieving linear-time complexity in practice, and produces solutions that (a) achieve up to a 60% performance speedup on large media-processing kernels, and (b) can either be verified to be optimal or have a bounded approximation margin.
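
    The sketch below illustrates why slack matching matters and what an iterative buffer-insertion loop looks like, using an assumed fork-join token model rather than the paper's protocol-aware heuristic: a shallow branch that re-converges with a deep branch throttles the whole pipeline until enough buffers are inserted on it. The depths, step counts, and stopping threshold are invented for the example.

```python
# Illustrative model only (not the paper's heuristic): tokens are forked into
# a deep branch A and a shallow branch B, and re-synchronized at a join.
# Each stage holds at most one token and advances one stage per step.

def throughput(depth_a, depth_b, steps=500):
    """Fraction of steps in which the join fires."""
    A, B = [False] * depth_a, [False] * depth_b
    fired = 0
    for _ in range(steps):
        if A[-1] and B[-1]:              # join fires when both branch tails hold tokens
            fired += 1
            A[-1] = B[-1] = False
        for lane in (A, B):              # shift tokens toward the join, tail-first
            for i in range(len(lane) - 2, -1, -1):
                if lane[i] and not lane[i + 1]:
                    lane[i + 1], lane[i] = True, False
        if not A[0] and not B[0]:        # the fork issues a token pair when both heads are free
            A[0] = B[0] = True
    return fired / steps

def insert_slack(depth_a, depth_b, min_gain=0.01):
    """Iteratively add buffer stages to the shallow branch while throughput improves."""
    slack, best = 0, throughput(depth_a, depth_b)
    while True:
        trial = throughput(depth_a, depth_b + slack + 1)
        if trial - best < min_gain:
            return slack, best
        slack, best = slack + 1, trial

print(throughput(6, 1))    # unbuffered: the join fires only ~1/6 of the time
print(insert_slack(6, 1))  # adding slack buffers restores near-full throughput
```

    With the assumed depths, the unbuffered pipeline fires the join only about once every six steps, and the loop stops once the shallow branch has enough slack to match the deep one.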